The CDF collaboration recently reported an upper limit on boosted top pair production and
noted a significant excess above the estimated background of events with two ultra-massive boosted 
jets. We discuss the interpretation of the measurement and its fundamental implications. In case 
new physics is involved, the most naive contribution is from a new particle produced with a cross 
section that is a few times higher than that of the top quark and a sizable hadronic branching ratio. 
We quantify the resulting tension of a possible larger top pair cross section with the absence of 
excess found in events with one massive boosted jet and missing energy. The measured planar flow 
distribution shows deviation from CDF's Pythia QCD prediction at high planarity, while we find a 
somewhat smaller deviation when comparing with other Monte Carlo tools. As a simple toy model, 
we analyze the case of a light gluino with R-parity violation and show that it can be made consistent 
with the data. 



Introduction. New physics searches at colliders typi- 
cally focus on signals with leptons and/or missing energy. 
Recently, there has been some interest in extending the 
hunt to include particles that decay only to quarks and 
gluons (see e.g. [TJ |5] for some theoretical studies), as 
was done in an analysis by CDF [3]. In this analysis the 
focus was on supersymmetry (SUSY) with R-parity vio- 
lation (RPV), where a light gluino decays to three quarks. 
However, this results in a multi-jet signal, which makes 
it challenging to distinguish from the QCD background. 
Indeed, it was found in [3] that the current sensitivity 
is far below the expected signal, thus it is not useful for 
obtaining a bound on the parameter space of SUSY (or 
any alternative theory which would produce this type of 
a signal). 

Progress has recently been achieved in another CDF
study by restricting the data sample to jets with high
transverse momentum ($p_T$) and high mass [H [5],
thus reducing the QCD background much more than the
signal and increasing the sensitivity (as was anticipated
in [5]). The idea is that the decay products of a highly
boosted massive object collimate into a single jet
in the detector. While the data are still dominated by
the QCD background, the discrimination power is much
larger. Moreover, various jet substructure techniques
can be used to further improve the efficiency.
This approach yielded the strongest existing bound on
the cross section for the production of a (high-$p_T$)
top pair, even without relying on substructure analysis.

The CDF study focused on events including two
boosted jets ($p_T > 400$ GeV for the leading jet) with
mass close to the top mass (130-210 GeV) and
pseudorapidity $|\eta| < 0.7$ (to be precise, an $\eta$ cut
was applied only for the leading jet, but it was found that
the second jet admitted a similarly bounded $\eta$ value)
[H |5]. The jet algorithms used are Midpoint and
anti-$k_T$ [S] with $R = 1.0$ ($R = 0.7$ was also checked),
which were in excellent agreement. As discussed below, the
estimation of the background depends on a parameter
$R_{\rm mass}$ (see Eq. (3)).



Using a data sample of 5.95 fb$^{-1}$ and assuming $R_{\rm mass} = 1$,
the standard model (SM) expected number of events is

$$ {\rm QCD}\big|_{R_{\rm mass}=1}:\ 13 \pm 2.4\,({\rm stat.}) \pm 3.9\,({\rm syst.})\,, \qquad t\bar t:\ 3.0 \pm 0.8\,. \qquad (1) $$

The number of observed events was 32 [5], which constitutes
a deviation of 3.44 standard deviations ($\sigma$) from the
above expectation. In order to translate this to a cross
section, we perform the following exercise. The SM NNLO
cross section for $t\bar t$ production with $p_T > 400$ GeV is
4.5 fb 0E]. Multiplying this by a branching ratio of 4/9
for hadronic tops, we get 2 fb, which corresponds to the
3 events reported in Eq. (1). Thus the difference between
the 32 observed events and the mean value of Eq. (1) is
translated to a cross section of

$$ \sigma_{\rm excess} \simeq (11 \pm 3.2)~{\rm fb}\,. \qquad (2) $$

This is the excess found in [5] in terms of hadronic top- 
equivalent cross section, under the assumption that the 
signal cannot be accounted for within the SM. The data 
can also be used to provide an upper bound on the all 
hadronic top pair production cross section, which is given 
by 20 fb at 95% confidence level [5]. 
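The arithmetic behind the 3.44$\sigma$ deviation and Eq. (2) can be retraced with a short script. This is only a sketch: it assumes that the background uncertainties of Eq. (1) are combined in quadrature, which the text does not state explicitly, and all variable names are ours. Under that assumption the significance and the central value of Eq. (2) are reproduced, while the uncertainty comes out at about 3.1 fb, slightly below the quoted 3.2 fb.

```python
import math

# Inputs quoted in the text around Eq. (1) and Eq. (2).
N_OBSERVED = 32
QCD, QCD_STAT, QCD_SYST = 13.0, 2.4, 3.9
TTBAR, TTBAR_ERR = 3.0, 0.8
SIGMA_TTBAR_FB = 4.5          # SM NNLO cross section, top pT > 400 GeV
BR_ALL_HADRONIC = 4.0 / 9.0   # both tops decay hadronically


def excess_summary():
    """Return (excess events, significance, excess in fb, uncertainty in fb)."""
    excess_events = N_OBSERVED - (QCD + TTBAR)
    # Assumption: stat., syst. and ttbar uncertainties are added in quadrature.
    background_unc = math.sqrt(QCD_STAT**2 + QCD_SYST**2 + TTBAR_ERR**2)
    significance = excess_events / background_unc
    # 3 expected ttbar events correspond to 4.5 fb x 4/9 = 2 fb.
    fb_per_event = SIGMA_TTBAR_FB * BR_ALL_HADRONIC / TTBAR
    return (excess_events, significance,
            excess_events * fb_per_event, background_unc * fb_per_event)


events, n_sigma, xsec_fb, xsec_unc_fb = excess_summary()
print(f"excess: {events:.0f} events, {n_sigma:.2f} sigma, "
      f"{xsec_fb:.1f} +- {xsec_unc_fb:.1f} fb")
```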

The evaluation of the QCD background in Eq. (1) was
done in the following way. The search was divided into
four different regions in terms of the jet masses. Region A
corresponds to events with two "light" jets, with masses
in the range of 30-50 GeV. Regions B and C are for one
massive jet (130-210 GeV) and one light jet, depending
on which is the leading jet in terms of $p_T$. Finally,
region D corresponds to two massive jets. There are three
basic assumptions involved: i) all the events in regions
A-C come only from QCD; ii) the actual cross section can
be factorized into the partonic cross section, which only
weakly depends on the masses of the final states, and the
jet and soft functions [S]; iii) the masses of the leading
and sub-leading jets are largely uncorrelated variables for
QCD jet production, and the correlation cancels in the
ratio $R_{\rm mass}$ described below. Under these assumptions,
we have

$$ R_{\rm mass} \equiv \frac{n_B\, n_C}{n_A\, n_D} = 1\,, \qquad (3) $$



where $n_X$ is the number of events in region X. One can
therefore estimate the number of QCD events in region
D by $n_B n_C / n_A$. The result of this calculation is the one
given in Eq. (1) for QCD. Below we test this estimation
in more detail.
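The data-driven estimate described above amounts to a one-line formula. The sketch below wraps it in a small function; the function name is ours, and the event counts in the usage line are purely hypothetical numbers chosen to land on 13 events, since the individual counts in regions A-C are not quoted in this text.

```python
def abcd_estimate(n_a, n_b, n_c, r_mass=1.0):
    """Estimate the QCD yield in region D from the control regions A, B, C.

    Solves R_mass = n_B n_C / (n_A n_D) for n_D, so r_mass = 1 corresponds
    to fully uncorrelated leading and sub-leading jet masses.
    """
    if n_a <= 0 or r_mass <= 0:
        raise ValueError("n_a and r_mass must be positive")
    return n_b * n_c / (n_a * r_mass)


# Hypothetical counts, for illustration only (not CDF data).
print(abcd_estimate(n_a=1000, n_b=100, n_c=130))
```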

The CDF study [5] used another search channel, including
one jet (with $p_T > 400$ GeV and mass 130-210 GeV) plus
missing energy (with missing $E_T$ significance between
4 and 10 - see definition in [5]). In the context of
$t\bar t$ production, this corresponds to events with one top
decaying hadronically and the other semileptonically.
Note that this type of measurement suffers from a lower
signal to background ratio, since there are large
fluctuations in the jet energy scale, which make the
estimation of the missing energy noisy (see Fig. 10 in [1],
where there are long tails for both the $t\bar t$ and QCD
missing transverse energy significance distributions). The
total number of events observed in both channels is
58, the estimated QCD background (for $R_{\rm mass} = 1$) is
44 ± 8.4 (stat.) ± 13 (syst.) and the $t\bar t$ background is 4.9.
This leads to an upper bound of 40 fb at 95% confidence
level on the $t\bar t$ production cross section for top quark
$p_T > 400$ GeV.

Another result given in [3] is the planar flow (Pf)
distribution [TO] (see also [H]). This jet substructure
variable distinguishes between a linear deposition of the
energy inside the jet, favored by QCD processes (giving
values of Pf close to 0), and a planar one (that is,
Pf close to 1), produced by the 3-body decay of a top
quark. The plot given in [3] shows that in the data there
are more events with high Pf values than predicted for
QCD alone.
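For orientation, planar flow can be computed from the jet constituents in a few lines. The sketch below uses the definition common in the jet substructure literature, Pf $= 4\det(I_w)/{\rm tr}(I_w)^2$ with $I_w^{kl} \propto \sum_i p_{i,k}\, p_{i,l}/E_i$, where $p_{i,k}$ are the momentum components of constituent $i$ transverse to the jet axis. That definition is not spelled out in this text, so it is an assumption here, as are the function name and the input layout. The overall $1/m_J$ normalization of $I_w$ cancels in the ratio and is omitted.

```python
import numpy as np


def planar_flow(constituents):
    """Planar flow of a jet given rows of (E, px, py, pz) for its constituents."""
    p4 = np.asarray(constituents, dtype=float)
    energy, mom = p4[:, 0], p4[:, 1:]

    # Jet axis: direction of the summed three-momentum.
    jet_mom = mom.sum(axis=0)
    axis = jet_mom / np.linalg.norm(jet_mom)

    # Orthonormal basis (e1, e2) of the plane transverse to the jet axis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(axis, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(axis, e1)

    # Constituent momenta projected onto the transverse plane, shape (n, 2).
    pt = np.stack([mom @ e1, mom @ e2], axis=1)

    # 2x2 energy-weighted tensor, up to the 1/m_J factor that cancels below.
    inertia = np.einsum("i,ik,il->kl", 1.0 / energy, pt, pt)
    trace = inertia[0, 0] + inertia[1, 1]
    if trace <= 0.0:
        return 0.0
    det = inertia[0, 0] * inertia[1, 1] - inertia[0, 1] * inertia[1, 0]
    return 4.0 * det / trace**2
```

Two back-to-back constituents in the transverse plane give Pf = 0 (a linear energy deposition), while three constituents spaced by 120 degrees around the axis give Pf = 1.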

Model Independent Interpretation. The excess
of events with two ultra-massive boosted jets hints at
a contribution characterized by a mass scale around
the top mass. This new source of massive jets
should be produced with a cross section larger than that
of the SM hadronic $t\bar t$ by a factor of roughly 5 (about
11 fb in the signal region, as in Eq. (2), but not more than
20 fb) and should have a dominant branching ratio for fully
hadronic decays. In order to have significant acceptance
under the search criteria, the production should be mostly
central, that is with $|\eta| < 0.7$ for both jets. Furthermore,
if it is due to the decay of a massive particle, the
collimation rate, which is the fraction of decays where the
daughter particles collimate into a single jet, must be
high, e.g. similar to that of the top (~ 0.5 [TO]).

The simplest explanation of this excess would be an
underestimation of the QCD production strength (no
massive particle involved). As described above, the existence
of an excess was established based on an estimation of the
QCD background in the signal region D, without relying
on Monte Carlo (MC) simulations. In this estimation, it
was assumed that the dependence of the partonic cross
section on the outgoing particles' virtuality (jet mass) is
negligible (footnote 1), as mentioned in assumption ii above.
To estimate the significance of this effect, we calculated the
leading order partonic cross section (footnote 2) for each of
the jet mass regions A-D (denoted as $\sigma_X$ for region X),
with the virtuality of the particle representing each jet:


$$ \sigma_X = \int dp_T\, dy\; 2 p_T \sum_{i,j} \int dx_1\, \frac{f_i(x_1,Q^2)\, f_j(x_2,Q^2)\, \hat\sigma_{ij}}{x_1 s + u - m^2}\,, \qquad (4) $$

where $m$ is the mass of the jet whose rapidity is $y$, $s$ and
$u$ are Mandelstam variables of the $p\bar p$ system, $\hat\sigma_{ij}$ is
the underlying partonic cross section and $f_i$ is the PDF at
momentum fraction $x$ and energy scale $Q$. The relation between
$x_1$ and $x_2$ and their integration range are determined by
the kinematics (see e.g. [12]). Now the number of events
$n_X$ is proportional to $\sigma_X$ times the jet mass functions
(still neglecting any jet correlations, as mentioned in
assumption iii above). Since only the latter part factorizes,
the ratio of events $n_B n_C / n_A$ used to estimate $n_D$ should
be corrected as follows:



$$ n_D = \frac{n_B\, n_C}{n_A} \times \frac{\sigma_A\, \sigma_D}{\sigma_B\, \sigma_C}\,. \qquad (5) $$



We found that this correction raises the estimated QCD 
background by only about 5% in the given jet mass win- 
dow 3 . This substantiates the reliability of the result 
of 0113. 
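The origin of the correction factor in Eq. (5) can be made explicit in a few lines. As a sketch, write the event count in each region as a common normalization times the partonic cross section times one jet mass function per jet. The symbols $\mathcal{N}$, $J^{(1)}$, $J^{(2)}$ (leading and sub-leading jet) and the subscripts $\ell$, $h$ (light and massive mass windows) are notation introduced here for illustration only.

```latex
\begin{align}
n_A &= \mathcal{N}\,\sigma_A\,J^{(1)}_{\ell} J^{(2)}_{\ell}, &
n_B &= \mathcal{N}\,\sigma_B\,J^{(1)}_{h} J^{(2)}_{\ell}, \\
n_C &= \mathcal{N}\,\sigma_C\,J^{(1)}_{\ell} J^{(2)}_{h}, &
n_D &= \mathcal{N}\,\sigma_D\,J^{(1)}_{h} J^{(2)}_{h}, \\
\frac{n_B\,n_C}{n_A} &= \mathcal{N}\,\frac{\sigma_B\,\sigma_C}{\sigma_A}\,J^{(1)}_{h} J^{(2)}_{h}
\quad\Longrightarrow\quad
n_D = \frac{n_B\,n_C}{n_A}\times\frac{\sigma_A\,\sigma_D}{\sigma_B\,\sigma_C}.
\end{align}
```

The jet mass functions drop out of the double ratio, so only the mass dependence of the partonic cross sections survives. If $\sigma_X$ were independent of the jet masses the factor would be exactly 1, recovering the uncorrected estimate $n_B n_C / n_A$.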

One possible caveat in this argument is that assump- 
tion iii above could turn out to be wrong. If there is some 
mechanism in QCD which leads to bias towards two mas- 
sive jets (relative to the evaluation used in [HE]), then it 
might be that the excess of events in region D is simply 
the consequence of underestimating the QCD contribu- 
tion. 

The relation in Eq. (3) is examined by MC simulations
in [TO]. The results from different MC tools are
shown in Table I. From this we learn that: i) the
deviations from $R_{\rm mass} = 1$ are small (within the systematic
uncertainties); ii) the matched MC results, which include
(jj+jjj+jjjj) and are expected to better estimate the QCD
jet mass distribution at large masses, are in very good
agreement with each other (even though they tend not
to agree on the individual jet mass distribution [TO]),
giving $R_{\rm mass} \simeq 0.87$.



1 We are grateful to Steve Ellis who questioned this assumption.

2 For the parton distribution functions (PDF), we used the CTEQ5
Mathematica implementation from http://www.phys.psu.edu/~cteq/

3 We found no significant sensitivity to interchanging between
CTEQ5M and CTEQ5L and to multiplying or dividing the
energy scale by 2 1 / 4 .
MC tool     Matching   $R_{\rm mass}$
Sherpa      Yes        0.88 ± 0.03
MadGraph    Yes        0.86 ± 0.04
MadGraph    No         0.76 ± 0.04
Herwig      No         0.86 ± 0.02

TABLE I: The results for $R_{\rm mass}$ (borrowed from [13]) from
different MC tools: Sherpa (1.2.3) [H] with matching, Mad- 
Graph/MadEvent 4.4.56 [it] with MLM matching [To] to 
the Pythia package 2.1.4 [17], MadGraph/MadEvent with no 
matching and Herwig 6.520 [IS] with no matching. The PDF 
set used was CTEQ6M [19], and FastJet 2.4.2 [20] with anti- 
kt algorithm 6 (A_R = 1) was used for jet clustering. Quoted 
errors are statistical only. 
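To see how the matched results of Table I translate into the value $R_{\rm mass} \simeq 0.87$ and what that would imply for the background, one can combine the two matched entries and rescale the nominal QCD estimate of Eq. (1). The inverse-variance weighting is our own choice (a plain average gives the same number to two digits), and the rescaled background is an illustration that follows from the definition in Eq. (3), not a number quoted in the CDF analysis.

```python
import math

# (MC tool, matched?, R_mass, statistical error) as listed in Table I.
TABLE_I = [
    ("Sherpa", True, 0.88, 0.03),
    ("MadGraph", True, 0.86, 0.04),
    ("MadGraph", False, 0.76, 0.04),
    ("Herwig", False, 0.86, 0.02),
]
NOMINAL_QCD_REGION_D = 13.0  # Eq. (1), obtained with R_mass = 1


def weighted_mean(rows):
    """Inverse-variance weighted mean of R_mass and its statistical error."""
    weights = [1.0 / err**2 for _, _, _, err in rows]
    mean = sum(w * row[2] for w, row in zip(weights, rows)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


matched_rows = [row for row in TABLE_I if row[1]]
r_matched, r_matched_err = weighted_mean(matched_rows)

# From Eq. (3): n_D = n_B n_C / (n_A R_mass) = (nominal estimate) / R_mass.
qcd_rescaled = NOMINAL_QCD_REGION_D / r_matched

print(f"R_mass (matched) = {r_matched:.2f} +- {r_matched_err:.2f}")
print(f"QCD estimate in region D rescaled by 1/R_mass: {qcd_rescaled:.1f} events")
```

With these inputs the QCD estimate moves from 13 to roughly 15 events, a shift that stays within the systematic uncertainty quoted in Eq. (1).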



The other possible explanation would be that the ex- 
cess is related to non-SM production of boosted top pairs. 
A relevant aspect of the CDF data is that no excess 
was found compared to the SM in the channel with one 
jet plus missing energy described above. However, this 
channel suffers from larger uncertainties, as already men- 
tioned. 

In the following exercise we estimate the tension in case
the hadronic excess is completely accounted for by tops.
Adding 16 hadronic top events, the expected semileptonic
sample (since including $\tau$'s the ratio is the same) would
be

$$ 31 + 1.9 + 16 \times (1.9/3) \approx 43\,, \qquad (6) $$

where 31 is the expected number of QCD events (estimated
as before using the ratio $n_B n_C / n_A$), 1.9 is the
number of expected hadronic-semileptonic top events,
and thus (1.9/3) is the ratio of acceptance of this sample
to the fully hadronic one, based on the estimation
in [5]. This constitutes an excess of 17 compared to the
observed 26 events [5]. The statistical uncertainty
involved is 8.1 events, while the systematics from the jet
energy scale and jet mass measurements is 30% of the
original 31 expected events. These are combined to a
standard deviation of 12 events, which means that the
tension with the semileptonic sample is at the level of
$17/12 = 1.4\sigma$. Thus we conclude that while a pure top
excess is not perfectly consistent with the data, it is far
from being disfavored.
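The numbers in this exercise can be checked with a few lines. The sketch below takes all inputs from the text and assumes, as the quoted total of 12 events suggests, that the statistical and systematic uncertainties are combined in quadrature. Variable names are ours.

```python
import math

QCD_EXPECTED = 31.0        # QCD events expected in the jet + missing energy sample
TOP_SEMILEPTONIC = 1.9     # expected SM hadronic-semileptonic top events
TOP_HADRONIC = 3.0         # expected SM fully hadronic top events, Eq. (1)
EXTRA_HADRONIC_TOPS = 16.0 # hadronic excess attributed entirely to tops
OBSERVED = 26
STAT_UNCERTAINTY = 8.1
SYST_FRACTION = 0.30       # of the 31 expected QCD events

# Eq. (6): scale the hadronic excess by the relative acceptance 1.9/3.
expected = (QCD_EXPECTED + TOP_SEMILEPTONIC
            + EXTRA_HADRONIC_TOPS * TOP_SEMILEPTONIC / TOP_HADRONIC)
excess = expected - OBSERVED

# Assumption: statistical and systematic parts are added in quadrature.
total_unc = math.hypot(STAT_UNCERTAINTY, SYST_FRACTION * QCD_EXPECTED)
tension = excess / total_unc

print(f"expected = {expected:.0f}, excess = {excess:.0f}, "
      f"uncertainty = {total_unc:.0f}, tension = {tension:.1f} sigma")
```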

Further motivation for an excess of boosted tops origi- 
nates from the possible relation with the measurement of 
forward-backward asymmetry in $t\bar t$ production [21] and
specifically the large deviation recently observed by CDF 
at high invariant masses [22]. This issue is investigated 
in detail in [T31.I23]. 

Finally, it is possible that the data hint at the presence
of new massive particles with a large production cross
section and hadronic final states. Standard hadronic
top searches include b-tagging as a necessary condition.
Since these show good agreement with the SM prediction
[24], the existence of a new particle which decays to
a bottom quark is probably disfavored, unless this state
would only be produced with a high boost, where these
searches would fail [25].

Regarding the planar flow distribution, it is interesting 
to note that a sizable excess for Pf > 0.4, relative to the 
Pythia prediction, was found in pQ. This might motivate 
a search for particles that effectively undergo 3-body
(or higher-multiplicity) decays (for this purpose, the
top's decay is considered as 3-body).

In order to investigate this issue, we used different MC 
tools to estimate the QCD Pf distribution in the relevant 
search window. The first is Herwig 6.520 with the PDF 
set CTEQ6L. The second is Madgraph/MadEvent 4.4.51 
with the Pythia 2.1.4 package and the same PDF set, 
with and without MLM matching. We also used Pythia 
6.4 by itself. All MCs were interfaced to FASTJET 2.4.2 
for jet clustering. The cuts used are the same as in the
CDF study (excluding the $\eta$ cut, which was found to have
a negligible effect). The result is shown in Fig. 1 together
with the recent CDF data. It is evident that the three
simulations that we use exhibit good agreement with each 
other, and furthermore that their resulting distributions 
are closer to the data than the Pythia one in [4]. Note 
also that reasonable agreement was found between the 
predictions of MadGraph/Pythia and Sherpa in [10] . 



[Figure 1 plot. Legend: CDF data, Herwig, Pythia, MG/Pythia. Horizontal axis: planar flow.]

FIG. 1: QCD planar flow distribution (normalized to unit 
area) calculated by different MC tools compared to the CDF 
data with the anti-$k_T$ jet algorithm (R=1.0) [1]. The data is
represented by orange circles with error bars, while the solid 
blue, dashed red and dotted green lines correspond to Herwig, 
Pythia and MadGraph with Pythia including MLM matching, 
respectively. 



We further demonstrate that a contribution from particles
with 3-body decays favors higher Pf values, such
that a proper combination with the QCD prediction can
yield a better agreement with the data. In Fig. 2 we show
the distribution generated by an RPV light gluino (see
below), separating between runs that include only a
partonic decay to three quarks and runs with showering and
hadronization (we do not combine the QCD contribution
here). Additionally, the figure presents the Pf distribution
of a hadronic top quark, borrowed from [3].

FIG. 2: Planar flow distribution of an RPV gluino decay
(normalized to unit area) calculated by different MC tools
compared to the CDF data with the anti-$k_T$ jet algorithm
(R=1.0) [4]. The data is represented by orange circles. The
solid light blue (dashed red) and dashed-dotted purple (dotted
blue) lines correspond to a particle level (partonic level)
simulation using Herwig and MadGraph with Pythia, respectively.
The short-dashed green line is for a hadronic top distribution,
borrowed from [4].

As an exercise, we calculated the Pf distribution of a 
toy model where a heavy scalar decays to three massless 
scalars. The decay was computed analytically, and the Pf 
distribution was obtained by random generation of events 
admitting the proper kinematics. It is interesting to men- 
tion that the resulting curve is in perfect agreement with 
the MG/Pythia partonic case, while if we add the proper 
matrix element of the decay to this "random" model, we 
find perfect agreement with the Herwig partonic curve. 
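A minimal sketch of such a "random" model is given below. It is not the authors' code and makes several simplifying assumptions that should be read as ours: the decay is generated flat in three-body phase space (uniform in the Dalitz variables $x_i = 2E_i/M$), the parent is boosted with $|\vec p| = p_T$ (a jet at zero rapidity), all three daughters are assumed to end up in the same jet with no cone or clustering requirement, and planar flow is taken as $4\det(I_w)/{\rm tr}(I_w)^2$ with $I_w \propto \sum_i p_{\perp i}\, p_{\perp i}^{T}/E_i$. The default mass and $p_T$ are illustrative values inside the search window.

```python
import numpy as np


def toy_planar_flow(n_events, mass=170.0, jet_pt=400.0, seed=0):
    """Planar flow for flat-phase-space decays M -> 3 massless particles."""
    if n_events < 1:
        raise ValueError("n_events must be at least 1")
    rng = np.random.default_rng(seed)

    # 1) Flat Dalitz plot: x1, x2 uniform in [0, 1) with x1 + x2 >= 1,
    #    so that x3 = 2 - x1 - x2 also lies in (0, 1].
    kept, n_kept = [], 0
    while n_kept < n_events:
        trial = rng.random((n_events, 2))
        trial = trial[trial.sum(axis=1) >= 1.0]
        kept.append(trial)
        n_kept += len(trial)
    x1, x2 = np.concatenate(kept)[:n_events].T

    # 2) Rest-frame momenta in a fixed plane; the opening angle follows from
    #    (p1 + p2)^2 = M^2 (1 - x3) for massless daughters.
    cos12 = np.clip(1.0 - 2.0 * (x1 + x2 - 1.0) / (x1 * x2), -1.0, 1.0)
    sin12 = np.sqrt(1.0 - cos12**2)
    zero = np.zeros_like(x1)
    p1 = 0.5 * mass * np.stack([x1, zero, zero], axis=1)
    p2 = 0.5 * mass * np.stack([x2 * cos12, x2 * sin12, zero], axis=1)
    mom = np.stack([p1, p2, -(p1 + p2)], axis=1)   # (event, particle, xyz)
    energy = np.linalg.norm(mom, axis=2)

    # 3) Isotropic boost direction, equivalent to a random decay orientation.
    cos_t = rng.uniform(-1.0, 1.0, n_events)
    phi = rng.uniform(0.0, 2.0 * np.pi, n_events)
    sin_t = np.sqrt(1.0 - cos_t**2)
    axis = np.stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t], axis=1)

    # 4) Boost along the axis: transverse components are unchanged.
    jet_energy = np.hypot(jet_pt, mass)
    gamma, beta = jet_energy / mass, jet_pt / jet_energy
    p_par = np.einsum("nij,nj->ni", mom, axis)
    p_perp = mom - p_par[:, :, None] * axis[:, None, :]
    e_lab = gamma * (energy + beta * p_par)

    # 5) The 3x3 tensor has a zero eigenvalue along the jet axis, so
    #    4 det / tr^2 of its 2x2 transverse block equals 2 (tr^2 - tr(T^2)) / tr^2.
    tensor = np.einsum("ni,nik,nil->nkl", 1.0 / e_lab, p_perp, p_perp)
    trace = np.trace(tensor, axis1=1, axis2=2)
    trace_sq = np.einsum("nkl,nlk->n", tensor, tensor)
    return 2.0 * (trace**2 - trace_sq) / trace**2


pf = toy_planar_flow(10000)
hist, edges = np.histogram(pf, bins=10, range=(0.0, 1.0), density=True)
```

A matrix-element weight could be applied to the sampled Dalitz points to mimic the second variant mentioned above. No claim is made here that this sketch reproduces either MC curve.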

We note that given the large uncertainties on the data, 
it does not seem instructive to make any quantitative 
comparisons of the Pf distributions in the two figures. 
At this stage, both QCD and 3-body decaying particles 
provide reasonable fits to the data. We expect that in 
the near future, when LHC data is available, it would 
be possible to make a distinction between the different 
cases pfTT], 

Toy Model. In order to demonstrate a toy model
that can account for the observed excess, we consider an
RPV gluino in the context of SUSY, where the rest of
the sparticles are decoupled for simplicity (in principle,
there could be interference effects in gluino production
from squarks, but this is highly model dependent). The
gluino decays to three quarks, hence if its mass is
inside the window used in the search, it would lead to an
excess of events with boosted jets [2].

Such a scenario has already received attention in a 
recent CDF search [3], considering only a non-boosted
region with conventional reconstruction. This study fo- 
cused on signals of six jets, and employed sophisticated 
techniques for reducing the background, such as three-jet 
correlations and vertex position tracking. Yet it turned 
out to be practically insensitive to a possible gluino con- 
tribution. 

Another interesting recent work [2] adopted a similar 
approach to that of [H [S] in search of an RPV gluino 
at the Tevatron, though it was based only on MC sim- 
ulations rather than real data. It required two boosted 
jets ($p_T > 350$ GeV) with masses close to each other and
further applied a certain jet substructure cut. This ap- 
proach was found to be quite sensitive to a gluino signal. 

We estimate the gluino signal as a function of its
mass using both Herwig and MadGraph/MadEvent with
Pythia. The results are presented in Table II (note that
there is some difference between the two MC tools, yet it
is evident that the ratio of these cross sections to that of
top pair production is constant). Also shown in the table
is the acceptance, which is the percentage of events that
pass all the cuts out of the overall sample of one boosted
jet from the corresponding particle. It is evident that the
cross section is indeed in the ballpark of the observed
excess (footnote 4). Since we do not try to provide a precise
fit of the signal, NLO corrections are not expected to change
this statement (and in any case they should be small because
of the strong cut - see e.g. Figure 9 in [27]). Moreover,
it is interesting that the gluino contribution enhances the
large Pf region of the distribution, as shown in Fig. 2.

As an outlook to the near future, we point out that the
LHC should be able to test whether there is indeed a
deviation from the SM in this type of signal, possibly
even with only $O(1)$ fb$^{-1}$. It would thus be interesting
to adapt the search for ultra-massive highly-boosted jets
to the LHC.
4 Very recently a new lower bound of 144 GeV for the gluino mass
appeared [26], thus excluding the first line in Table II.
                                   Acceptance                 Cross section [fb]
Particle                           Herwig   MG/Pythia         Herwig   MG/Pythia       MG/Pythia
                                            (no matching)              (no matching)   (with matching)
Gluino, $m_{\tilde g}$ = 130 GeV   0.43     0.49              15       17              18
Gluino, $m_{\tilde g}$ = 150 GeV   0.52     0.50              13       14              15
Gluino, $m_{\tilde g}$ = 170 GeV   0.49     0.48              11       12              12
Hadronic top quark pair            0.47     0.46              1.6      1.7             1.8


TABLE II: The gluino cross section and acceptance for masses of 130, 150 and 170 GeV, computed by both Herwig and 
MadGraph with Pythia. For the latter we also add a calculation including an extra jet with MLM matching. As a comparison, 
we present the hadronic top cross section and acceptance. 